For Google Colab Users

If you are running this notebook on Google Colab, please make sure to do two things: **First**, switch the runtime to a GPU instance. To do so, open the `Runtime` menu in Google Colab and select `Change runtime type`. In the popup window that appears, leave `Runtime type` on Python 3, but set `Hardware accelerator` to **GPU**. **Second**, execute the following code cell to prepare the notebook environment and its dependencies.
In [1]:
import sys
if 'google.colab' in sys.modules:
    
    # Clone GitHub repository
    !git clone https://github.com/epfl-exts/amdl20-image-classification.git
        
    # Copy files required to run the code
    !cp -r 'amdl20-image-classification/downloads' 'amdl20-image-classification/utils.py' .

    # Install packages via pip
    !pip install -r "amdl20-image-classification/colab-requirements.txt"
    
    # Restart Runtime
    import os
    os.kill(os.getpid(), 9)

 Image Classification

The goal of this hands-on exercise is to provide a general overview of image classification and to show you how you can train a model and later use it to classify images according to your own categories.

The whole process is subdivided into 4 sections:

  1. Data Preparation: Collect the data, clean it and prepare it for further analysis.
  1. Data Exploration: Explore the dataset and modify it to your needs.
  1. Data Modeling and Analysis: Model creation, optimization & evaluation.
  1. Results Discussion & Exploration: Investigate model performance and understand results.
Note: The full code to this hands-on exercise can be found under github.com/epfl-exts/amdl20-image-classification.

Preparation of the programming environment

First things first, let's load a few libraries that we need during our analysis. The exact details of these statements are not important, but in short: we're preparing all the functions we would like to use and making sure that the plotting of the figures looks nice.

In [2]:
import sys
if 'google.colab' in sys.modules:
    %tensorflow_version 2.x
%run utils.py
%matplotlib inline

1. Data Preparation

The goal of this section is to (1) define the categories we want to predict, (2) collect the data, (3) clean it and (4) load it into a useful and accessible data format.

1.1. Define Class Labels

Task 1 (after completing the exercise at least once)

Feel free to change the list of class labels below as you like. For this hands-on exercise it is recommended to have between 3 and 8 classes. **However**, each new class will take some time to be downloaded! Therefore, to provide a smooth workshop experience, we recommend keeping the class label list as it is and exploring different classes at a later stage.
In [3]:
# List of class labels to use for the image classification
class_labels = [
               'brown bear',
               'polar bear',
               'giant panda',
               'red panda',
               'lion',
               'tiger',
               'racoon',
               'red fox',
               ]

1.2. Collect Dataset

In an ideal case you would collect the data yourself and make sure that it is of high quality and usable for an image classification project. In the scope of this hands-on exercise we don't have the time for that, so we will take a few shortcuts.

First, we will use the Python package Google Images Download to download images of our target classes directly via Google's image search service. However, this library only allows the download of at most 100 images per query. To bypass this restriction we will search for each of our class labels multiple times, each time with an additional keyword.

So, instead of just looking for images of a "brown bear", we can also look for "brown bear close up" and "brown bear portrait" images. This is exactly what we will do with the parameter search_suffix below.
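How collect_images combines the labels and suffixes internally is not shown in this notebook, but the idea can be sketched in a few lines (this is an assumption about the query construction, not the helper's actual code):

```python
# Sketch: each suffix multiplies the number of search queries per class.
class_labels = ['brown bear', 'polar bear']
search_suffix = ',close up,portrait'

# The leading comma yields an empty suffix, which keeps the plain label
suffixes = search_suffix.split(',')  # ['', 'close up', 'portrait']

queries = [f'{label} {suffix}'.strip()
           for label in class_labels
           for suffix in suffixes]

print(queries)
# ['brown bear', 'brown bear close up', 'brown bear portrait',
#  'polar bear', 'polar bear close up', 'polar bear portrait']
```

With 8 class labels and 2 extra suffixes this turns 8 searches of at most 100 images each into 24, tripling the potential dataset size.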

In [4]:
# Define additional search terms to expand the number of searches
search_suffix = ',close up,portrait'

Once the class labels and the additional search terms are defined, we can go ahead and collect the images from the web.

In [5]:
# Download the dataset and store the images on disk
imgs_raw = collect_images(class_labels, suffix=search_suffix)
Collecting data about: brown_bear
Collecting data about: polar_bear
Collecting data about: giant_panda
Collecting data about: red_panda
Collecting data about: lion
Collecting data about: tiger
Collecting data about: racoon
Collecting data about: red_fox
---
A Total of N=3624 images were collected!

Once all the images are downloaded and the file names are stored in the imgs_raw variable, we can go ahead and take a look at a few of them.

In [6]:
# Let's take a look at the data we've collected
plot_images(imgs_raw, n_col=10, n_row=5)
**Note**: All the images above are square, i.e. they have the same number of pixels in width and height. This is done on purpose, to make all photos "equal" and comparable. This restriction means that images that originally were rectangular were cropped to a square shape - which explains why some of the images seem to cut off the important part of the scene.
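The cropping itself happens inside the helper functions; a minimal sketch of such a center crop, assuming the images are held as NumPy arrays, could look like this (an illustration, not the notebook's actual cropping code):

```python
import numpy as np

def center_crop_square(img):
    """Crop an H x W x 3 array to a centered square with side min(H, W)."""
    h, w = img.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return img[top:top + side, left:left + side]

# A hypothetical 40 x 60 "landscape" image loses 10 columns on each side
img = np.zeros((40, 60, 3), dtype=np.uint8)
cropped = center_crop_square(img)
print(cropped.shape)  # (40, 40, 3)
```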

1.3. Clean Dataset

Almost all datasets are to a certain degree "dirty" and need to be cleaned. Cleaning in the context of classification means to:

  • remove duplicate samples
  • remove outliers or extreme values that are not part of the target class
  • handle missing values

We don't have any missing values in this dataset (images are either downloaded or not), so we will only focus on the first two points.

 Remove Duplicates

In [7]:
# Remove duplicates from the dataset
imgs_unique = remove_duplicates(imgs_raw)
Total number of images in the dataset: 3624
Number of duplicates in the dataset:    148
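The internals of remove_duplicates live in utils.py; a common approach, sketched here as an assumption rather than the helper's actual implementation, is to hash each file's bytes and drop entries whose digest was already seen:

```python
import hashlib

def find_duplicates(file_contents):
    """Return indices of entries whose bytes hash to an already-seen digest."""
    seen, duplicates = set(), []
    for i, data in enumerate(file_contents):
        digest = hashlib.md5(data).hexdigest()
        if digest in seen:
            duplicates.append(i)
        else:
            seen.add(digest)
    return duplicates

# Three "files", the third a byte-for-byte copy of the first
files = [b'image-a', b'image-b', b'image-a']
print(find_duplicates(files))  # [2]
```

Hashing the raw bytes only catches exact copies; two downloads of the same photo at different resolutions would survive this filter.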

 Remove Outliers

The removal of outliers is a bit more tricky when it comes to images. Ideally you would do this by hand and make sure that you only have meaningful images in your dataset, corresponding exactly to the chosen class labels. For this hands-on exercise, a manual check is not possible, so let's try a shortcut.

Each image has a particular color profile, i.e. distribution of red, green and blue (RGB) pixel values. Looking at these particular RGB value distributions might allow us to differentiate photos of animals from photoshopped images, gray colored ones or graphical logos. Let's take a closer look at a few examples.

In [8]:
# Plot images together with their RGB color distributions
plot_images(imgs_unique, n_col=8, n_row=2, show_histogram=True)

To remove outlier images from the dataset, we will take a look at the RGB color profile of each image and remove images that either are only in gray colors or that have sudden extreme spikes in the RGB profile (i.e. images that have huge amounts of pixels with the exact same color value).
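Both checks can be sketched in a few lines of NumPy; the concrete thresholds and details of remove_outliers are assumptions here, chosen only to illustrate the idea:

```python
import numpy as np

def is_outlier(img, spike_frac=0.2):
    """Flag an image as an outlier if it is gray-colored (R == G == B for
    every pixel) or if a single intensity value covers more than spike_frac
    of all pixel values (an extreme spike in the RGB profile)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    if np.array_equal(r, g) and np.array_equal(g, b):
        return True  # gray-colored image
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    return hist.max() / img.size > spike_frac  # sudden extreme spike

rng = np.random.default_rng(0)
natural = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # photo-like
gray = np.full((32, 32, 3), 128, dtype=np.uint8)                  # gray image
logo = np.zeros((32, 32, 3), dtype=np.uint8)                      # flat colors
logo[..., 0] = 255

print(is_outlier(natural), is_outlier(gray), is_outlier(logo))
# False True True
```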

In [9]:
# Remove outliers from the dataset
imgs_clean, imgs_outlier = remove_outliers(imgs_unique)
Total number of images in the dataset: 3476
Number of outliers in the dataset:      239

Now that we've separated the dataset into clean and outlier images, let's take a closer look at each group:

In [10]:
# Plot some outlier images
plot_images(imgs_outlier, n_col=10, n_row=5)
In [11]:
# Plot some clean images
plot_images(imgs_clean, n_col=10, n_row=5)

1.4. Finalize Dataset

As a last step in the data preparation section, we need to finalize the actual dataset and store the relevant image information into useful variables.

Task 2 (after completing the exercise at least once)

The image dimension of 32 was chosen as a trade-off between speed and detail. To better understand the effect of this trade-off, we recommend exploring different values for img_dim. What happens if you set this number to 16, 64 or even 128? **Note**: Changing this parameter also affects the computation time. For this reason, we recommend leaving it at img_dim=32 during your first walk-through of this hands-on exercise.
In [12]:
# Create dataset
X_pixel, y_pixel, metainfo = create_dataset(imgs_clean, class_labels, img_dim=32)
Found 3237 images belonging to 8 classes.

Dataset was augmented to a total of N=6474 images through means of:
Image rotation, flipping, shifting, zooming and brightness variation.

Images resampled to a resolution of 32 x 32!
**Note 1:** In machine learning, the variable X is usually used for the dataframe, i.e. the feature matrix, and the variable y is used for the target labels. Here we've added an additional variable called metainfo that contains relevant information about the dataset (e.g. size of the dataset, class labels, image filenames, etc.), which is needed for the code further below.

**Note 2:** The function create_dataset above also takes the parameter img_dim=32, which specifies the pixel resolution to which the images should be resampled. In this case, 32 means that all images are resized to a resolution of 32 x 32 pixels. Downsampling images from their original size reduces the computation time and makes sure that all images have the same size.
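The exact resampling method used inside create_dataset is not shown here, but the speed/detail trade-off behind img_dim can be illustrated with a minimal block-averaging sketch (an assumption for illustration, not the helper's actual implementation):

```python
import numpy as np

def downsample(img, img_dim):
    """Downsample a square H x H x 3 image to img_dim x img_dim by averaging
    non-overlapping pixel blocks (a simple stand-in for proper resampling)."""
    h = img.shape[0]
    assert h % img_dim == 0, 'this sketch assumes an integer scaling factor'
    f = h // img_dim
    return img.reshape(img_dim, f, img_dim, f, 3).mean(axis=(1, 3))

# A hypothetical 64 x 64 image shrinks to 32 x 32: 4x fewer values to model
img = np.arange(64 * 64 * 3, dtype=float).reshape(64, 64, 3)
small = downsample(img, 32)
print(img.size, '->', small.size)  # 12288 -> 3072
```

Halving img_dim quarters the number of pixel features, which is exactly why the model fitting below runs faster on smaller images.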

2. Data Exploration

Once the dataset is ready to use, it is important that we make sure that we understand its properties. This can be done via a so-called Exploratory Data Analysis, or EDA for short. Each of these EDA steps should lead to an observation and a decision.

2.1. Data Description

So, first things first, let's take a closer look at our dataset and describe its basic properties.

In [13]:
# How many images per class do we have?
plot_class_distribution(y_pixel, metainfo)

Observation: The classes are not equally distributed. Some classes have more than twice as many images as others.
Decision: The unbalanced occurrence of the different classes needs to be considered during the data modeling process.
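One common way to account for such an imbalance is to weight each class inversely to its frequency; this is, as an assumption, roughly what the "balanced" class weighting used during model optimization below computes (e.g. scikit-learn's class_weight='balanced' heuristic n_samples / (n_classes * count)):

```python
import numpy as np

# Hypothetical labels for an unbalanced 3-class dataset
y = np.array([0] * 600 + [1] * 300 + [2] * 100)

classes, counts = np.unique(y, return_counts=True)
# 'balanced' heuristic: n_samples / (n_classes * count_per_class)
weights = len(y) / (len(classes) * counts)
print(dict(zip(classes.tolist(), weights.round(3).tolist())))
```

Rare classes receive a larger weight, so misclassifying them costs the model more during training.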

In [14]:
# What do the average images of each class look like?
plot_class_average(X_pixel, y_pixel, metainfo)

Observation: Some classes like giant_panda, polar_bear and red_panda have very unique color characteristics.
Decision: Let's take a closer look at the RGB color distribution of the average images.

In [15]:
# What does the average RGB color profile look like per class?
plot_class_RGB(X_pixel, y_pixel, metainfo)

Observation: The individual RGB color distributions of almost all classes look different.
Decision: We should use the RGB color profile as a data feature for the image classification.

2.2. Feature Engineering

From the previous section, we know that the RGB color profile of each image might be useful to do the image classification. For this reason, let's take the first dataset X_pixel and create a second dataset called X_rgb that consists only of the RGB color profiles.
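A hypothetical sketch of such a profile extraction for a single image is shown below; the actual extract_RGB_features may differ, e.g. in bin count or normalization:

```python
import numpy as np

def rgb_profile(img, bins=32):
    """Concatenate per-channel intensity histograms into one feature vector."""
    feats = []
    for c in range(3):  # red, green, blue
        hist, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
        feats.append(hist / img[..., c].size)  # normalize away the image size
    return np.concatenate(feats)

rng = np.random.default_rng(42)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
x = rgb_profile(img)
print(x.shape)  # (96,)
```

Note how this compresses a 32 x 32 x 3 image (3072 values) into 96 features: the spatial layout is discarded and only the color statistics remain.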

In [16]:
# Extract RGB color profiles for each image individually
X_rgb, y_rgb = extract_RGB_features(y_pixel, metainfo)

In addition to using the RGB color profile as new data features, let's also try to extract meaningful features from the images by using a state-of-the-art neural network called MobileNet. Note that the network we will use for this feature extraction will be downloaded directly from TensorFlow Hub.

In [17]:
# Extract features according to MobileNet (Neural Network)
X_nn, y_nn = extract_neural_network_features()
Building model.
Found 3237 images belonging to 8 classes.
Extracting features.
Dataset was augmented to a total of N=6474 images through means of:
Image rotation, flipping, shifting, zooming and brightness variation.

2.3 Recap before Data Modeling

Before we continue with the next section, let's quickly recap the three different ways in which we've represented our images:

  • X_pixel: This dataset contains the original data in its raw form. Each value is a RGB pixel value in the original image.
  • X_rgb: This dataset contains the RGB color profile of each image.
  • X_nn: This dataset contains an N-dimensional feature vector per image, extracted with the neural network architecture called MobileNet.

To better understand what this means, let's plot an image in these three representations:

In [18]:
plot_recap(X_pixel, X_rgb, X_nn)

3. Modeling and Analysis

Once the data is (1) collected, cleaned and prepared and (2) meaningful features are extracted - we are ready for the data modeling. We will optimize the model fit once for each kind of dataset, i.e. X_pixel, X_rgb and X_nn.

**More in-depth information about the classifier:** The model used in this hands-on example is a [Ridge Regression Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html). Ridge regression has the advantage of being rather quick, while also allowing us to fine-tune a regularization parameter alpha for an optimal data fit. During model optimization, we use a cross-validation approach with 4 folds to fine-tune this hyperparameter. To test for generalization, the dataset is initially split into a training and a test set with a 50/50 ratio. Because the class labels are unequally distributed throughout the dataset, a weighted class-balancing approach is applied during model optimization.
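The exact internals of model_fit are defined in utils.py; as a rough, self-contained sketch of the setup just described, run here on synthetic data with an assumed alpha grid of 25 candidates:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for one of the feature matrices (e.g. X_rgb), 8 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 96))
y = rng.integers(0, 8, size=400)
X[np.arange(400), y] += 3.0  # inject a signal so the classes are separable

# 50/50 train/test split, as described above
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# 25 alpha candidates, 4-fold cross-validation, weighted class balancing
grid = GridSearchCV(
    RidgeClassifier(class_weight='balanced'),
    param_grid={'alpha': np.logspace(-3, 3, 25)},
    cv=4, n_jobs=-1,
)
grid.fit(X_tr, y_tr)
print(f'Test accuracy: {grid.score(X_te, y_te):.2%}')
```

With 25 candidates and 4 folds this trains 100 models in total, which matches the "Fitting 4 folds for each of 25 candidates, totalling 100 fits" messages in the cells below.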

3.1. Model Fitting on X_pixel

In [19]:
# Train a model fit to the data in pixel format
model_pixel = model_fit(X_pixel, y_pixel)
Fitting 4 folds for each of 25 candidates, totalling 100 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   24.6s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:   38.7s finished
Model trained for 39.46s total and reached an accuracy of: 42.94%
In [20]:
# Investigation of the model performance
check_model_performance(model_pixel, metainfo)

3.2. Model Fitting on X_rgb

In [21]:
# Train a model fit to the data in rgb format
model_rgb = model_fit(X_rgb, y_rgb)
Fitting 4 folds for each of 25 candidates, totalling 100 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.6s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    3.8s finished
Model trained for 3.95s total and reached an accuracy of: 35.19%
In [22]:
# Investigation of the model performance
check_model_performance(model_rgb, metainfo)

3.3. Model Fitting on X_nn

In [23]:
# Train a model fit to the data in neural network format
model_nn = model_fit(X_nn, y_nn)
Fitting 4 folds for each of 25 candidates, totalling 100 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    2.6s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    6.3s finished
Model trained for 6.42s total and reached an accuracy of: 99.26%
In [24]:
# Investigation of the model performance
check_model_performance(model_nn, metainfo)

4. Results Discussion & Exploration

Once the model is optimized and the general model performance is investigated, we should take a closer look at the results. Why did the model predict a given class? With what certainty was this decision made? Which images did the model misclassify, and why? Which images are easy for the model to classify? ...

These are all questions we should investigate once the model is trained.

4.1. Investigation of Model Performance

Task 3

The `model_nn` should lead to the best results. Nonetheless, try to investigate the model performance for the `model_pixel` and `model_rgb` as well by changing the variable name in the following cell.
In [25]:
# Choose which model to investigate: 'model_pixel', 'model_rgb' or 'model_nn'
model = model_nn

Let's see which images the model classified correctly and how certain it was:

In [26]:
# Plot correct predictions
investigate_predictions(model, metainfo, show_correct=True, nimg=7)

Let's see which images the model misclassified and how wrong it was:

In [27]:
# Plot wrong predictions
investigate_predictions(model, metainfo, show_correct=False, nimg=7)

4.2. Use pretrained model to predict class probabilities on a new image from the web

Training a model can be computationally intensive. But once the model is trained, we can use it to very quickly predict the class probabilities of any other image we give it.

So, let's take an image from the web and see how well the trained model performs on a new image!

In [28]:
# Predict the class probabilities of an image from the web
img_url = 'https://c402277.ssl.cf1.rackcdn.com/photos/18134/images/hero_small/Medium_WW226365.jpg?1574452099'
predict_new_image(img_url, model_nn, metainfo)
Downloading image.
Feature Extraction.
Plotting report.

What about an image that is only slightly related to the target classes?

In [29]:
# Predict the class probabilities of an image from the web
img_url = 'https://live.staticflickr.com/7279/7017467025_8807cc82f6_b.jpg'
predict_new_image(img_url, model_nn, metainfo)
Downloading image.
Feature Extraction.
Plotting report.

What about me? What kind of an animal am I?

In [30]:
# Predict the class probabilities of another image from the web
img_url = 'https://d33wubrfki0l68.cloudfront.net/067d6b185769f031404e927b1b70de6d5ece3e0a/d82f4/michael.1164620e.jpg'
predict_new_image(img_url, model_nn, metainfo)
Downloading image.
Feature Extraction.
Plotting report.

Task 4

Choose your own image from the internet, indicate the URL link to the image under `img_url` and run the following cell.
In [ ]:
img_url = 'www.YOUR_OWN_IMAGE.jpg'
predict_new_image(img_url, model_nn, metainfo)